Large-scale Submodular Greedy Exemplar Selection with Structured Similarity Matrices

Authors

  • Dmitry Malioutov
  • Abhishek Kumar
  • Ian En-Hsu Yen
Abstract

Exemplar clustering attempts to find a subset of data-points that summarizes the entire data-set in the sense of minimizing the sum of distances from each point to its closest exemplar. It has many important applications in machine learning, including document and video summarization, data compression, scalability of kernel methods and Gaussian processes, active learning, and feature selection. A key challenge in the adoption of exemplar clustering for large-scale applications has been the availability of accurate and scalable algorithms. We propose an approach that combines structured similarity matrix representations with submodular greedy maximization, which can dramatically increase the scalability of exemplar clustering while still enjoying good approximation guarantees. Exploiting structured similarity matrices within the context of submodular greedy algorithms is by no means trivial, as naive approaches still require computing all the entries of the matrix. We propose a randomized approach based on sampling sign-patterns of columns of the similarity matrix and establish accuracy guarantees. We demonstrate significant computational speed-ups while still achieving highly accurate solutions, and solve problems with up to millions of data-points in around a minute or less on a single commodity computer.
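
For orientation, the following is a minimal sketch of the naive greedy baseline that the abstract contrasts against, not the authors' randomized sign-pattern method. It assumes the common facility-location form of the objective, f(S) = sum over points i of the maximum similarity between i and any exemplar in S, with non-negative similarities. The function names `greedy_exemplars` and `coverage`, the toy points, and the similarity 1 / (1 + |x_i - x_j|) are all illustrative assumptions. Note that it reads every entry of the dense similarity matrix, which is exactly the cost the paper aims to avoid.

```python
from typing import List, Sequence


def greedy_exemplars(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k exemplars for f(S) = sum_i max_{j in S} sim[i][j].

    Assumption: sim[i][j] >= 0 is the similarity of point i to candidate j.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    best = [0.0] * n  # best[i]: similarity of point i to its closest chosen exemplar
    chosen: List[int] = []
    for _ in range(min(k, m)):
        top_gain, top_j = -1.0, -1
        for j in range(m):
            if j in chosen:
                continue
            # Marginal gain of adding candidate j to the current exemplar set.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > top_gain:
                top_gain, top_j = gain, j
        chosen.append(top_j)
        for i in range(n):
            best[i] = max(best[i], sim[i][top_j])
    return chosen


def coverage(sim: Sequence[Sequence[float]], chosen: Sequence[int]) -> float:
    """Objective value f(chosen) for a non-empty exemplar set."""
    return sum(max(row[j] for j in chosen) for row in sim)


# Toy data (hypothetical): two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]
chosen = greedy_exemplars(sim, 2)
print(chosen, coverage(sim, chosen))
```

Each greedy step here costs on the order of n times m similarity look-ups, which is why a dense matrix becomes the bottleneck at the scale the abstract describes.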

Similar resources

Distributed Submodular Cover: Succinctly Summarizing Massive Data

How can one find a subset, ideally as small as possible, that well represents a massive dataset? I.e., its corresponding utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. In this paper, we formalize this challenge as a submodular cover problem. Here, the utility is assumed to exhibit submodularity, a natural diminishing returns condit...

Fast Multi-stage Submodular Maximization

Motivated by extremely large-scale machine learning problems, we introduce a new multistage algorithmic framework for submodular maximization (called MultGreed), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoreticall...

Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets

To cope with the high level of ambiguity faced in domains such as Computer Vision or Natural Language Processing, robust prediction methods often search for a diverse set of high-quality candidate solutions or proposals. In structured prediction problems, this becomes a daunting task, as the solution space (image labelings, sentence parses, etc.) is exponentially large. We study greedy algorith...

Fast Multi-Stage Submodular Maximization: Extended version

Motivated by extremely large-scale machine learning problems, we introduce a new multistage algorithmic framework for submodular maximization (called MultGreed), where at each stage we apply an approximate greedy procedure to maximize surrogate submodular functions. The surrogates serve as proxies for a target submodular function but require less memory and are easy to evaluate. We theoreticall...

Submodular Maximization and Diversity in Structured Output Spaces

We study the greedy maximization of a submodular set function F : 2^V → R when each item in the ground set V is itself a combinatorial object, e.g. a configuration or labeling of a base set of variables z = {z1, ..., zm}. This problem arises naturally in a number of domains, such as Computer Vision or Natural Language Processing, where we want to search for a set of diverse high-quality solutions...

Publication date: 2016